A prosody-based approach to end-of-utterance detection that does not require speech recognition

نویسندگان

  • Luciana Ferrer
  • Elizabeth Shriberg
  • Andreas Stolcke
چکیده

In previous work we showed that state-of-the-art end-of-utterance detection (as used, for example, in dialog systems) can be improved significantly by making use of prosodic and/or language models that predict utterance endpoints, based on word and alignment output from a speech recognizer. However, using a recognizer in endpointing might not be practical in certain applications. In this paper we demonstrate that the improvements due to the prosodic knowledge can be realized largely without alignment information, i.e., without requiring a speech recognizer. A prosodic end-of-utterance detector using only speech/nonspeech detection output is still considerably more accurate and has lower latency than a baseline system based on pause-length thresholding.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Recognizing the Emotional State Changes in Human Utterance by a Learning Statistical Method based on Gaussian Mixture Model

Speech is one of the most opulent and instant methods to express emotional characteristics of human beings, which conveys the cognitive and semantic concepts among humans. In this study, a statistical-based method for emotional recognition of speech signals is proposed, and a learning approach is introduced, which is based on the statistical model to classify internal feelings of the utterance....

متن کامل

Is the speaker done yet? faster and more accurate end-of-utterance detection using prosody

We examine the problem of end-of-utterance (EOU) detection for real-time speech recognition, particularly in the context of a human-computer dialog system. Current EOU detection algorithms use only a simple pause threshold for making this decision, leading to two problems. First, especially as speech-driven interfaces become more natural, users often pause inside utterances, resulting in a prem...

متن کامل

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

متن کامل

Prosody and Automatic Speech Recognition —- Why not yet a Success Story and where to go from here

We describe the different linguistic and paralinguistic functions of prosody, show how features can be computed that describe the prosodic marking of these functions, and how this knowledge can be used in an automatic speech understanding system. This is done in the context of the speech–to–speech translation system Verbmobil, where prosody is used to segment the user utterance and to find self...

متن کامل

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB) as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIBchr('39')s archive is one of the richest archives in Iran containing a huge amount of multimedia data. Monitoring this massive volume of data, and brows and retrieval of this archive is one of the key issues for this broadcasting...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003